Automatically obtaining real-time, geographically-relevant product information from heterogeneous sources

ABSTRACT

Techniques for obtaining geographically-relevant product inventory information, in real-time, from heterogeneous data sources are described. Product inventory information, including the volume of available products in specific geographical locations, is obtained from at least three different sources. First, one or more data feeds may be received. Second, a data obtaining module uses one or more APIs to obtain product inventory information from one or more third-party inventory management systems. Finally, a structured data mining module uses a web crawler, at the direction of a crawler configuration, to systematically obtain product inventory information from various third-party websites. Accordingly, a user's search query is processed to provide geographically relevant product inventory information in near real time.

RELATED APPLICATIONS

The present application is a continuation of U.S. patent application Ser. No. 13/366,962, filed Feb. 6, 2012, which claims the benefit of priority, under 35 U.S.C. § 119(e), to U.S. Provisional Patent Application Ser. No. 61/439,724, entitled, “Methods and Systems for Automatically Obtaining Real-Time, Geographically-Relevant Product Information From Heterogeneous Sources, and Enhancing and Presenting the Product Information”, filed on Feb. 4, 2011, which is by way of reference incorporated herein in its entirety.

TECHNICAL FIELD

The present disclosure generally relates to data processing techniques for obtaining, from disparate and heterogeneous sources, real-time, geographically-relevant information concerning products and their availability.

BACKGROUND

The Internet and the World Wide Web have given rise to a wide variety of on-line retailers that operate virtual stores from which consumers can purchase products (i.e., merchandise, or goods) as well as services. Although the popularity of these on-line retail sites is clearly evidenced by their increasing sales, for a variety of reasons, some consumers may still prefer to purchase products and services in a more conventional manner, i.e., via a brick-and-mortar store.

DESCRIPTION OF THE DRAWINGS

Some embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings:

FIG. 1 is an example of a web page from which various elements of information are retrieved by a crawler operating in conjunction with a crawler configuration generated with a crawler configuration application, according to some embodiments of the invention;

FIG. 2 is a block diagram illustrating an example of a cache key in the form of a three-tuple with a zip code, offer or product code, and offer variant, consistent with some embodiments of the invention;

FIG. 3 is a block diagram illustrating the data sources and data flows that occur for populating a database with product inventory information, according to some embodiments of the invention; and

FIG. 4 is a block diagram of a machine in the form of a computing device within which a set of instructions, for causing the machine to perform any one or more of the methodologies discussed herein, may be executed.

DETAILED DESCRIPTION

The present disclosure describes data processing techniques for obtaining, from disparate and heterogeneous sources, real-time, geographically-relevant information concerning products and their availability. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the various aspects of different embodiments of the present invention. It will be evident, however, to one skilled in the art, that the present invention may be practiced without all of the specific details.

Embodiments of the present invention involve a set of sophisticated and computer-implemented automated tools and processes for obtaining current data about products and their availability from a wide variety of data sources, such as web sites, network-connected databases, inventory systems, and so forth. In particular, the systems and methods described herein facilitate obtaining and presenting, in near real-time, geographically-relevant data concerning products and their availability, such that a potential consumer can perform a web-based search to locate a product, with its current inventory information, at a retail store in a particular geographical area. For example, an automated process (e.g., a crawler) can be configured to obtain product information from a variety of web sites. Alternatively, an external database may be accessed via an application programming interface (API). In any case, once the data is obtained, this data is enhanced and stored in a local database. The data can then be presented to potential consumers in response to a consumer browsing or searching for relevant products and specifying a particular location. As there are many stages involved in the overall process of obtaining, enhancing and presenting this product data, the following description of the inventive subject matter is presented in sections, which loosely correlate with the various stages.

Data Acquisition—Structured Data Mining

Consistent with some embodiments of the inventive subject matter, data from a wide variety of sources is obtained via a system and method of structured data mining. The system and related automated processes that facilitate the structured data mining consist primarily of two components. The first is a web-based application (referred to herein as the crawler construction kit, or CCK) used to configure one or more proprietary crawlers. A crawler (sometimes referred to as a web crawler, or bot) is an automated computer program process that operates to browse the Internet or World Wide Web in a methodical manner, gathering or obtaining data in an orderly fashion. The CCK allows its user to browse a retailer's web site and quickly establish a crawler configuration (e.g., a set of automated steps) that is required to obtain some item of information (e.g., the color, price, quantity available, etc.) about a particular product being offered via a particular retailer. Accordingly, using the CCK, a user can create a crawler configuration (e.g., a set of interpretable, or executable, instructions), which is then used to direct a crawler to perform a particular set of operations to obtain a particular set or item of data, and thereby populate a database with product inventory information obtained automatically from various websites. This type of technique is generally referred to as web scraping.

With some embodiments, the CCK provides a user with a web-based set of tools for selecting and tagging various elements of a web page that correspond with elements of product inventory information that can be automatically extracted by an automated crawler. For instance, with some embodiments, the CCK application enables a user to manipulate a cursor with a pointing device to interact with elements on a web page, for example, by clicking, selecting, dragging, etc. When a particular item or element of information displayed on the web page has been selected, the source document underlying the web page is analyzed to identify information that might be used by a crawler to extract or obtain the element of information. This information is then automatically populated in a crawler configuration (e.g., a configuration file) for a particular crawler that will later be used to periodically obtain the set or item of information. In some cases, the CCK application may prompt the user to select various options or settings for use in obtaining a particular element of information. Additionally, as discussed briefly below, the user may opt to open a separate window, pane or similar user interface element in which the user can directly edit a snippet of code for inclusion with the crawler configuration for the specified crawler. For instance, in certain scenarios, a user may be required to customize a crawler configuration to direct a crawler to perform some specialized operation(s) that are required to obtain a particular element of information.

Once extracted, the data may be manipulated or enhanced and then inserted into a database and used in the processing of users' queries, the presentation of search results, etc. With some embodiments, the information may be enhanced by normalizing it so that common characteristics can be compared using a common nomenclature. Additionally, with some embodiments, specific products may be categorized and classified into a proprietary hierarchy. Similarly, with some embodiments, products may be assigned proprietary product identifiers, where common, publicly available SKUs (or other identifiers) are not used.

FIG. 1 illustrates an example user interface for a merchant website. Using the CCK tool, a user can select various elements of information presented in the web page, and specify configuration information for use by a crawler in obtaining the elements of information. For instance, any of the following elements of information may be selected with the CCK tool for purposes of customizing or configuring the crawler: the descriptive name of the product 10, the users' ratings and reviews 12, the text within the details tab 14, the information presented within the fit tab 16, the shipping information 18, the color information 20, the size information 22, the picture 24, the pricing information 26, and the item number 28. In addition, with some embodiments, the crawler may identify a volume or amount of a particular product that is available within a particular geographical location.

The second component that is part of the structured data mining system is a suite of crawlers that are configured to use a crawler configuration created by the web-based CCK application. In contrast to conventional web crawlers, crawlers consistent with embodiments of the invention are configured to be driven by the crawler configurations that are created by the CCK, which can be quite complex. As a result, the crawlers can be configured to crawl web sites and obtain data that many conventional automated crawlers would have no way of accessing. As there may be many different crawlers in the suite of crawlers, the crawler configuration may specify the particular crawler for which the configuration is to be used.
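
As a minimal sketch of how a crawler configuration might name the crawler it is intended for, the following Python example dispatches a JSON configuration to one crawler in a small registry. The configuration fields (crawler, start_url, selectors), the crawler names, and the URL are all hypothetical and are not taken from the described embodiments; the crawlers here are placeholders that perform no network access.

```python
import json

# Hypothetical registry mapping crawler names to crawler callables.
CRAWLERS = {}


def register(name):
    """Register a crawler function under the name used in configurations."""
    def decorator(func):
        CRAWLERS[name] = func
        return func
    return decorator


@register("static_html")
def crawl_static_html(config):
    # Placeholder: a real crawler would fetch config["start_url"] here.
    return {"crawler": "static_html", "fields": sorted(config["selectors"])}


@register("ajax")
def crawl_ajax(config):
    # Placeholder for a crawler that would handle script-driven pages.
    return {"crawler": "ajax", "fields": sorted(config["selectors"])}


def run_configuration(config_text):
    """Parse a configuration and hand it to the crawler it names."""
    config = json.loads(config_text)
    crawler = CRAWLERS[config["crawler"]]
    return crawler(config)


# Assumed configuration layout, for illustration only.
EXAMPLE_CONFIG = json.dumps({
    "crawler": "static_html",
    "start_url": "https://merchant.example/product/123",
    "selectors": {"price": "span.price", "title": "h1.title"},
})

result = run_configuration(EXAMPLE_CONFIG)
```

Keeping the crawler name inside the configuration means the same dispatch code can serve every crawler in the suite.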

Consistent with some embodiments of the invention, the web-based CCK application enables an approach to describing how to select desired information from various sources (HTML, XML, JSON, JavaScript, etc.). Fundamentally, the web-based CCK application defines sets of what are referred to herein as selectors, where each selector describes how to extract a single item of information (e.g., product title, retail price, product image URL, description, etc.). Each selector is in essence a set of steps, or a pipeline, that describes a series of operations that are to be performed in order to request and then extract the desired information from a web server, for insertion into a database.

To establish a pipeline, the following steps or stages are followed.

1. Select elements from the data source.
2. Apply filters to the selected elements.
3. Apply filters to the values of the selected elements.
4. Apply custom treatments to the values.

Each of the first three stages has several built-in mechanisms, but in most cases the user can, if necessary, fall back to writing code (e.g., Python code) directly in the user interface of the web-based CCK application in order to define custom behaviors. For instance, the web-based CCK application includes a code editing module that enables a user to define a script or section of executable code, which can be executed to perform a customized operation that is not easily definable by the automated tools of the web-based CCK application. This code can be arbitrarily complex; for example, it can open new network resources, download additional web pages or assets, use third-party libraries, and so on. Accordingly, the web-based CCK application enables a user to very quickly automate a crawler to retrieve an item of information from a web site, by generating a crawler configuration, and if necessary, customizing the behavior of the crawler to perform more complex operations. The custom treatments in stage four (4) are all built-in optimizations for common cases that are frequently encountered in this problem domain (e.g., handling of currency in prices).
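
The four pipeline stages can be sketched in Python as follows. This is an illustrative sketch only: the selector layout (path, element_filter, value_filter, treatment), the sample page, and the parse_price treatment are assumptions made for the example, and a well-formed markup snippet is parsed with the standard library rather than fetched from a web server.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed product page used only for this example.
PAGE = """<html><body>
<div class="product">
  <span class="price">$1,299.00</span>
  <span class="price old">$1,499.00</span>
  <span class="title">Tablet 32 GB</span>
</div>
</body></html>"""


def select_elements(root, path):
    """Stage 1: select candidate elements from the data source."""
    return root.findall(path)


def filter_elements(elements, predicate):
    """Stage 2: keep only the elements that satisfy the predicate."""
    return [e for e in elements if predicate(e)]


def filter_values(values, predicate):
    """Stage 3: keep only the extracted values that satisfy the predicate."""
    return [v for v in values if predicate(v)]


def parse_price(value):
    """Stage 4: example treatment that strips currency symbols and commas."""
    return float(re.sub(r"[^0-9.]", "", value))


def run_selector(page, selector):
    """Run one selector (a four-stage pipeline) against a page."""
    root = ET.fromstring(page)
    elements = select_elements(root, selector["path"])
    elements = filter_elements(elements, selector["element_filter"])
    values = [(e.text or "").strip() for e in elements]
    values = filter_values(values, selector["value_filter"])
    return [selector["treatment"](v) for v in values]


# Assumed selector definition for the current retail price.
PRICE_SELECTOR = {
    "path": ".//span",
    "element_filter": lambda e: e.get("class") == "price",
    "value_filter": lambda v: v.startswith("$"),
    "treatment": parse_price,
}

prices = run_selector(PAGE, PRICE_SELECTOR)
```

Because each stage is an ordinary callable, a user-supplied function can replace any built-in stage, which mirrors the fallback to custom code described above.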

In addition to configuration, the web-based CCK application also supports live testing of configurations as well as automated validation of crawler configurations. Accordingly, a user attempting to generate a crawler configuration to obtain a particular item of information about a product can select to test the crawler configuration in real-time, and observe how the crawler, controlled by the crawler configuration, performs the operations. This allows the user to tweak or modify the configuration to obtain the required data item.

Real-Time Product Availability Lookup (RTPAL)

In addition to using a crawler to obtain information, with some embodiments, a more formal or dedicated process might also be used. Whereas a crawler can obtain information from websites when the operator of the website does not provide a publicly available API, a Real-Time Product Availability Lookup (RTPAL) system generally relies on the existence of formal, publicly accessible inventory systems to obtain product inventory information. For instance, with some embodiments, the RTPAL system is used to query external inventory systems. The RTPAL system consists primarily of three components. The first component is a framework for retrieval and caching of information from individual merchant inventory systems. The second component is a suite of components to make building clients to individual inventory systems easy. The third component is a set of individual clients (which are built using these components to run inside the framework) for accessing specific merchant inventory systems (i.e., individual big box retailers like Target, Best Buy, etc., as well as aggregate sources like Volusion or MerchantOS, and small merchant sources, such as QuickBooks or Microsoft Dynamics).

The RTPAL system has a cache system that uses ZVOTs (Zip-Code, Variation, Offer tuples) as cache keys. Specifically, the cache key used in querying the cache includes three components: a zip code relevant to the query, an offer identifier corresponding with the specific offer or product, and a variation identifier specifying or indicating the particular variant of the product or offering. The offer identifier is essentially synonymous with a product identifier, and uniquely identifies at a top level a particular product or item that is being offered for sale. A variation is a set of product-specific characteristics. For example, for clothing, the variation may specify such characteristics as size and color, etc. With other products, other variations are possible. For instance, with a tablet computer, a variation may specify the amount of memory (16 GB, 32 GB, 64 GB, etc.) included with the computer. The zip code is used to specify the zip code of relevance to the search. For instance, if a user is looking for a product in a particular zip code, the specified zip code can be used to query the cache and ensure only relevant cached information is returned. In other embodiments, the system might implement fuzzy geographic-based caching in order to drastically increase the cache hit rate and to support significantly higher traffic volumes.

FIG. 2 illustrates an example of a three-tuple (e.g., ZVOT) cache key for querying a cache. For example, the cache key with reference number 30 scores a cache hit with the cache entry having reference number 32 when used to query the cache entries on the right with reference number 34. By including the zip code in the cache key, those cache entries that are geographically relevant to a particular user's product query can be returned and presented with minimal processing delay.
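
A three-tuple cache key of this kind can be sketched in Python as shown below. The class and field names, the variation string format, and the time-to-live expiry are assumptions introduced for illustration; the description above does not specify how cache entries are expired or how variations are encoded.

```python
import time
from typing import NamedTuple


class ZVOT(NamedTuple):
    """Cache key: zip code, variation identifier, offer identifier."""
    zip_code: str
    variation: str
    offer: str


class InventoryCache:
    """In-memory cache keyed by ZVOT tuples.

    The time-to-live (ttl_seconds) is an assumption for this sketch.
    """

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}

    def put(self, key, quantity):
        # Store the quantity together with the time it was cached.
        self._entries[key] = (quantity, self._clock())

    def get(self, key):
        # Return the cached quantity, or None on a miss or expired entry.
        entry = self._entries.get(key)
        if entry is None:
            return None
        quantity, stored_at = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[key]
            return None
        return quantity


cache = InventoryCache()
key = ZVOT("94107", "size=M;color=blue", "offer-123")
cache.put(key, 4)
hit = cache.get(key)
miss = cache.get(ZVOT("10001", "size=M;color=blue", "offer-123"))
```

Because the zip code is part of the key, the same offer and variation cached for a different zip code is a miss, so only geographically relevant entries are returned.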

In addition to using the RTPAL and the structured data mining techniques described above, with some embodiments, product inventory information is received from third-party sources via a simple data feed. Accordingly, FIG. 3 illustrates a block diagram of the various data sources from which product inventory information can be obtained. For instance, the data sources 40 include data feeds 42, application programming interfaces (APIs) 44 for accessing merchant-specific product inventory systems, and structured data mining of third-party websites 46. As illustrated in FIG. 3, the data obtaining module 48 facilitates the receiving of the product inventory information from the data feeds 42 and the RTPAL-based APIs 44, while the structured data mining module 50 facilitates the real-time receipt of information from third-party websites. As described in greater detail below, once obtained, the data is stored in a product inventory database 52 and enhanced, for example, by the product offering matching module 54.

Automated Product Matching

With some embodiments, automated product matching is performed by a product offering matching module 54 (FIG. 3). The goal of product matching is to aggregate offers for the same product to enhance the user experience. When data is collected, all attempts are made to capture unique product identifiers such as UPC, EAN, ASIN, ISBN, SKU and Model Number. When one or more of the above identifiers are available, a rule-based algorithm is invoked to determine if the offer matches an existing product. If a match is achieved, the offer is assigned to the matching product. If none of the above identifiers are available, other attributes of the offer, for example title, description, brand and specifications, are used to determine a similarity score with respect to one or more existing products. If a match is found, the matching product is assigned, and if not, a new product is created.
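
The two-step matching logic can be sketched in Python as follows. This is a sketch under stated assumptions, not the matching module itself: the dictionary field names, the use of difflib string similarity, the 0.8 threshold, and the choice to create a new product when an identifier is present but matches nothing are all illustrative.

```python
from difflib import SequenceMatcher

# Assumed field names for the identifiers and attributes named in the text.
ID_FIELDS = ("upc", "ean", "asin", "isbn", "sku", "model_number")
TEXT_FIELDS = ("title", "description", "brand", "specifications")


def match_by_identifier(offer, products):
    """Rule-based step: match on any shared unique identifier."""
    for product in products:
        for field in ID_FIELDS:
            value = offer.get(field)
            if value and value == product.get(field):
                return product
    return None


def similarity(offer, product):
    """Average string similarity over the text fields both records have."""
    scores = []
    for field in TEXT_FIELDS:
        a, b = offer.get(field, ""), product.get(field, "")
        if a and b:
            scores.append(SequenceMatcher(None, a.lower(), b.lower()).ratio())
    return sum(scores) / len(scores) if scores else 0.0


def assign_product(offer, products, threshold=0.8):
    """Return the matching product, creating a new one if none matches."""
    if any(offer.get(f) for f in ID_FIELDS):
        match = match_by_identifier(offer, products)
    else:
        scored = [(similarity(offer, p), p) for p in products]
        best_score, best = max(scored, key=lambda pair: pair[0],
                               default=(0.0, None))
        match = best if best_score >= threshold else None
    if match is None:
        # Assumption: an unmatched offer always seeds a new product record.
        new_id = max((p["product_id"] for p in products), default=0) + 1
        match = {**offer, "product_id": new_id}
        products.append(match)
    return match
```

For example, an offer carrying a known UPC resolves to the existing product directly, while an offer with only a title and brand is compared by similarity score and becomes a new product if no score reaches the threshold.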

Automated Product Categorization

With some embodiments of the invention, a product type taxonomy is used. For instance, with some embodiments, the taxonomy may consist of approximately three-thousand (3000) unique categories and sub-categories, arranged as nodes of a tree-like hierarchical structure. Approximately twenty-six hundred (2600) of these unique categories may be leaf nodes. An example of a leaf node would be: Vehicle GPS Units. The aim of categorization is to ensure that every product offer has at least one category node assigned to it.

With some embodiments, labeled offers are collected. These labeled offers are used as training data in a machine learning algorithm, which then classifies the remaining unlabeled offers. The classification algorithm is a hybrid of variations on several different classic algorithms: Naive Bayes, Rocchio, and kNN. With some embodiments, precision varies by category and is typically upwards of 0.9. Overall precision may be upwards of 0.96. With some embodiments, approximately 80% of active offers can be classified with the automated categorization system.
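
To illustrate one of the classic algorithms named above, the following Python sketch implements a plain Rocchio (nearest-centroid) text classifier over offer titles. It is not the hybrid classifier described here, which is not specified in detail; the whitespace tokenizer, the cosine similarity measure, and the training examples (including the T-Shirts category) are assumptions chosen for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def train_centroids(labeled_offers):
    """Build one average term-frequency vector (centroid) per category."""
    sums = defaultdict(Counter)
    counts = Counter()
    for title, category in labeled_offers:
        sums[category].update(tokenize(title))
        counts[category] += 1
    return {
        category: {term: n / counts[category] for term, n in vector.items()}
        for category, vector in sums.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(title, centroids):
    """Assign the category whose centroid is most similar to the title."""
    vector = Counter(tokenize(title))
    return max(centroids, key=lambda c: cosine(vector, centroids[c]))


# Hypothetical labeled offers used as training data.
TRAINING = [
    ("garmin vehicle gps unit", "Vehicle GPS Units"),
    ("tomtom car gps navigator", "Vehicle GPS Units"),
    ("cotton crew neck t-shirt", "T-Shirts"),
    ("blue cotton t-shirt", "T-Shirts"),
]

centroids = train_centroids(TRAINING)
predicted = classify("portable gps unit for car", centroids)
```

A production classifier would likely weight terms (for example with TF-IDF) and combine several models, but the centroid comparison shows the basic idea of classifying unlabeled offers from labeled ones.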

The various operations of example methods described herein may be performed, at least partially, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented modules or objects that operate to perform one or more operations or functions. The modules and objects referred to herein may, in some example embodiments, comprise processor-implemented modules and/or objects.

Similarly, the methods described herein may be at least partially processor-implemented. For example, at least some of the operations of a method may be performed by one or more processors or processor-implemented modules. The performance of certain operations may be distributed among the one or more processors, not only residing within a single machine or computer, but deployed across a number of machines or computers. In some example embodiments, the processor or processors may be located in a single location (e.g., within a home environment, an office environment or at a server farm), while in other embodiments the processors may be distributed across a number of locations.

The one or more processors may also operate to support performance of the relevant operations in a “cloud computing” environment or within the context of “software as a service” (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., Application Program Interfaces (APIs)).

FIG. 4 is a block diagram of a machine in the form of a computer system within which a set of instructions, for causing the machine to perform any one or more of the methodologies discussed herein, may be executed. In alternative embodiments, the machine operates as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, the machine may operate in the capacity of a server or a client machine in a client-server network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. In a preferred embodiment, the machine will be a server computer; however, in alternative embodiments, the machine may be a personal computer (PC), a tablet PC, a set-top box (STB), a Personal Digital Assistant (PDA), a mobile telephone, a web appliance, a network router, switch or bridge, or any machine capable of executing instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.

The example computer system 1500 includes a processor 1502 (e.g., a central processing unit (CPU), a graphics processing unit (GPU) or both), a main memory 1501 and a static memory 1506, which communicate with each other via a bus 1508. The computer system 1500 may further include a display unit 1510, an alphanumeric input device 1517 (e.g., a keyboard), and a user interface (UI) navigation device 1511 (e.g., a mouse). In one embodiment, the display, input device and cursor control device are a touch screen display. The computer system 1500 may additionally include a storage device 1516 (e.g., drive unit), a signal generation device 1518 (e.g., a speaker), a network interface device 1520, and one or more sensors 1521, such as a global positioning system sensor, compass, accelerometer, or other sensor.

The drive unit 1516 includes a machine-readable medium 1522 on which is stored one or more sets of instructions and data structures (e.g., software 1523) embodying or utilized by any one or more of the methodologies or functions described herein. The software 1523 may also reside, completely or at least partially, within the main memory 1501 and/or within the processor 1502 during execution thereof by the computer system 1500, the main memory 1501 and the processor 1502 also constituting machine-readable media.

While the machine-readable medium 1522 is illustrated in an example embodiment to be a single medium, the term “machine-readable medium” may include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more instructions. The term “machine-readable medium” shall also be taken to include any tangible medium that is capable of storing, encoding or carrying instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present invention, or that is capable of storing, encoding or carrying data structures utilized by or associated with such instructions. The term “machine-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, and optical and magnetic media. Specific examples of machine-readable media include non-volatile memory, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.

The software 1523 may further be transmitted or received over a communications network 1526 using a transmission medium via the network interface device 1520 utilizing any one of a number of well-known transfer protocols (e.g., HTTP). Examples of communication networks include a local area network (“LAN”), a wide area network (“WAN”), the Internet, mobile telephone networks, Plain Old Telephone Service (POTS) networks, and wireless data networks (e.g., Wi-Fi® and WiMax® networks). The term “transmission medium” shall be taken to include any intangible medium that is capable of storing, encoding or carrying instructions for execution by the machine, and includes digital or analog communications signals or other intangible medium to facilitate communication of such software.

Although an embodiment has been described with reference to specific example embodiments, it will be evident that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the invention. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense. The accompanying drawings that form a part hereof, show by way of illustration, and not of limitation, specific embodiments in which the subject matter may be practiced. The embodiments illustrated are described in sufficient detail to enable those skilled in the art to practice the teachings disclosed herein. Other embodiments may be utilized and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. This Detailed Description, therefore, is not to be taken in a limiting sense, and the scope of various embodiments is defined only by the appended claims, along with the full range of equivalents to which such claims are entitled.

What is claimed is:
 1. A method comprising: providing a web-based crawler configuration application to a client device; receiving, via the web-based crawler configuration application, a selection of an element of a web page displayed at the client device; identifying, in a source document of the web page, an element of product inventory information corresponding to the selected element of the web page; and generating a crawler configuration file based on the identified element of product inventory information, the crawler configuration file configuring an operation of a crawler application.
 2. The method of claim 1, further comprising: executing a first set of instructions representing an instance of the crawler application, the instance of the crawler application configured to perform a set of operations specified in the crawler configuration file, the set of operations resulting in a retrieval of product inventory information for one or more products hosted at one or more web servers; executing a second set of instructions representing an instance of a real-time product availability lookup (RTPAL) application, the RTPAL application to use one or more application programming interfaces (APIs) to request and receive product inventory information from one or more third-party network-connected inventory management systems; enhancing the product inventory information from the crawler application with the product inventory information from the RTPAL application; and storing the enhanced product inventory information in the database.
 3. The method of claim 2, further comprising: subsequent to receiving product inventory information as a result of executing either of the crawler application and the RTPAL application, determining that received product inventory information for a particular product does not specify a unique product identifier; performing a matching operation by comparing various elements of information concerning the particular product with corresponding information from one or more known products to determine a product with which the received product inventory information for the particular product not specifying the unique product identifier best matches; and storing the received product inventory information for the particular product not specifying the unique product identifier in the database.
 4. The method of claim 2, further comprising: storing product inventory information for a particular product received via the crawler application or the RTPAL application in a data cache with each cache entry having a cache key based on a zip code for a location at which the particular product received via the crawler application or the RTPAL application is available, a product identifier, and a product variation identifier.
 5. The method of claim 2, further comprising: assigning to each product identified in the received product inventory information as a result of executing either of the crawler application and the RTPAL application one or more category identifiers for a particular category in a hierarchical category.
 6. The method of claim 1, further comprising: receiving a search query from a user, the search query including information identifying a desired geographical area; and processing the search query to provide product inventory information for a particular product satisfying the search query, the product inventory information for the particular product satisfying the search query specifying one or more merchant stores in a geographical location satisfying the desired geographical area identified in the query, and indicating a quantity of the particular product satisfying the search query available at each of the one or more merchant stores.
 7. The method of claim 1, wherein the web-based crawler configuration application enables the user to specify one of a suite of web crawlers for use with a particular crawler configuration file generated by the web crawler configuration application.
 8. The method of claim 1, wherein the crawler configuration identifies one crawler from a plurality of crawlers to be used with the crawler configuration.
 9. The method of claim 1, wherein the web-based crawler configuration application enables the user to invoke a code editing application to specify customized code for obtaining a particular element of product inventory information, the customized code for inclusion in a crawler configuration file for use with a particular crawler.
 10. The method of claim 1, wherein the web-based crawler configuration application defines a set of selectors, each selector describing how to extract a single item of information for insertion into the database.
 11. A server comprising: one or moreprocessors for executing one or more sets of instructions stored in amemory, the one or more set of instructions comprising: providing aweb-based crawler configuration application to client device; receiving,via the web-based crawler configuration application, a selection of anelement of a web page displayed at the client device; identifying, in asource document of the web page, an element of product inventoryinformation corresponding to the selected element of the web page; andgenerating a crawler configuration file based on the identified elementof product inventory information, the crawler configuration fileconfiguring an operation of a crawler application.
12. The server of claim 11, wherein the one or more sets of instructions further comprise: executing a first set of instructions representing an instance of the crawler application, the instance of the crawler application configured to perform a set of operations specified in the crawler configuration file, the set of operations resulting in a retrieval of product inventory information for one or more products hosted at one or more web servers; executing a second set of instructions representing an instance of a real-time product availability lookup (RTPAL) application, the RTPAL application to use one or more application programming interfaces (APIs) to request and receive product inventory information from one or more third-party network-connected inventory management systems; enhancing the product inventory information from the crawler application with the product inventory information from the RTPAL application; and storing the enhanced product inventory information in the database.
13. The server of claim 12, wherein the one or more sets of instructions further comprise: subsequent to receiving product inventory information as a result of executing either of the crawler application and the RTPAL application, determining that received product inventory information for a particular product does not specify a unique product identifier; performing a matching operation by comparing various elements of information concerning the particular product with corresponding information from one or more known products to determine a product with which the received product inventory information for the particular product not specifying the unique product identifier best matches; and storing the received product inventory information for the particular product not specifying the unique product identifier in the database.
14. The server of claim 12, wherein the one or more sets of instructions further comprise: storing product inventory information for a particular product received via the crawler application or the RTPAL application in a data cache with each cache entry having a cache key based on a zipcode for a location at which the particular product received via the crawler application or the RTPAL application is available, a product identifier, and a product variation identifier.
15. The server of claim 11, wherein the one or more sets of instructions further comprise: receiving a search query from a user, the search query including information identifying a desired geographical area; and processing the search query to provide product inventory information for a particular product satisfying the search query, the product inventory information for the particular product satisfying the search query specifying one or more merchant stores in a geographical location satisfying the desired geographical area identified in the query, and indicating a quantity of the particular product satisfying the search query available at each of the one or more merchant stores.
16. The server of claim 11, wherein the web-based crawler configuration application enables the user to specify one of a suite of web crawlers for use with a particular crawler configuration file generated by the web crawler configuration application.
17. The server of claim 11, wherein the crawler configuration identifies one crawler from a plurality of crawlers to be used with the crawler configuration.
18. The server of claim 11, wherein the web-based crawler configuration application enables the user to invoke a code editing application to specify customized code for obtaining a particular element of product inventory information, the customized code for inclusion in a crawler configuration file for use with a particular crawler.
19. The server of claim 11, wherein the web-based crawler configuration application defines a set of selectors, each selector describing how to extract a single item of information for insertion into the database.
20. A machine-readable storage medium storing instructions thereon, which, when executed by a processor of a server, will cause the server to perform a set of operations comprising: providing a web-based crawler configuration application to a client device; receiving, via the web-based crawler configuration application, a selection of an element of a web page displayed at the client device; identifying, in a source document of the web page, an element of product inventory information corresponding to the selected element of the web page; and generating a crawler configuration file based on the identified element of product inventory information, the crawler configuration file configuring an operation of a crawler application.
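
By way of illustration only, the cache key recited in claim 14 (a zipcode, a product identifier, and a product variation identifier) might be composed as in the following Python sketch. The claims do not specify a key format or a cache implementation; the colon delimiter, the in-memory dictionary, and the names InventoryRecord, make_cache_key, and InventoryCache are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class InventoryRecord:
    """Hypothetical record shape; the field names are assumptions."""
    zipcode: str
    product_id: str
    variation_id: str
    store: str
    quantity: int


def make_cache_key(zipcode: str, product_id: str, variation_id: str) -> str:
    # The three components come from claim 14; joining them with ":" is an
    # assumed convention, not something the claims specify.
    return ":".join((zipcode.strip(), product_id.strip(), variation_id.strip()))


class InventoryCache:
    """Minimal in-memory stand-in for the data cache (an assumption)."""

    def __init__(self) -> None:
        self._entries: Dict[str, InventoryRecord] = {}

    def put(self, record: InventoryRecord) -> str:
        # Store the record under its composite key and return that key.
        key = make_cache_key(record.zipcode, record.product_id, record.variation_id)
        self._entries[key] = record
        return key

    def get(self, zipcode: str, product_id: str,
            variation_id: str) -> Optional[InventoryRecord]:
        # Look up by the same three components; None if no entry exists.
        return self._entries.get(make_cache_key(zipcode, product_id, variation_id))


cache = InventoryCache()
key = cache.put(InventoryRecord("94105", "sku-1001", "red-m", "Store A", 7))
hit = cache.get("94105", "sku-1001", "red-m")
miss = cache.get("10001", "sku-1001", "red-m")
```

In this sketch, the same product and variation stocked in two zipcodes occupy two distinct cache entries, so a lookup for one geographical area does not return inventory held in another.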